Existing interaction techniques in handheld augmented reality (AR) have 
frequently used touchscreen input (pure two-dimensional (2D) pointing and 
clicking) on the handheld device's display to select a target virtual 
object. However, accurate selection of a distant target 
object is challenging because the target appears smaller as the 
distance increases. Selection becomes even more difficult 
when another virtual object occludes the distant target. 
Therefore, this study presents a target selection method for 
occluded and distant objects in handheld AR. The method combines 
the raycasting technique with real hand gestures. 
A leap motion device is mounted at the back of the handheld device to track 
the real hand gesture. The markerless tracking technology of simultaneous 
localization and mapping (SLAM) is implemented to enable the AR 
environment. Based on the results, the aim of this study was achieved. 
1. INTRODUCTION 

Augmented reality (AR) technology inserts virtual components into views of the real-world 
surroundings, which enhances human perception of the real world [1]. 
Although AR was initially developed mainly for head-mounted displays (HMD) and desktop 
computers, advances in technology have made handheld devices such as tablets and 
smartphones the preferred platform for AR interaction due to their availability [2]-[4]. Although marker-based tracking 
is used in many AR applications, recent research shows that markerless tracking techniques such as 
simultaneous localization and mapping (SLAM) are increasingly adopted for handheld AR. SLAM tracks 
the pose of the device without requiring the user to set up fiducial markers or scan the environment 
beforehand [5]. 

One of the most essential aspects of AR applications is the interaction between the 
user and the virtual content within the space [6]. Target selection lets the user acquire or specify 
a target item in order to carry out subsequent interactions on or with it [7]; manipulation tasks 
often follow, and depend on, selection tasks. As a consequence, poorly designed selection procedures often 
impair overall user efficiency. Therefore, the target selection method is our main focus. This study addresses 
the issue of target selection on occluded and distant objects in a handheld AR environment. This issue is 
important and often occurs when interacting with a target object. Although it is commonly discussed for 
virtual reality (VR) environments, it has received little attention for AR environments. 
Nevertheless, because AR follows the same interaction principles as VR, target selection methods proposed for VR 
are also applicable in AR. Target selection 
on occluded and distant objects is often highlighted in previous research as an ongoing issue [8]-[11]. 
When the target is fully or partially occluded (a cluttered scene), 
traditional methods may lose precision and speed 
[12]. Meanwhile, when the occluded target is also distant from the user, the 
selection task becomes more difficult to complete [9], [13], [14]. 

The problem is made worse by the conventional touch input of handheld 
devices, which is limited by the device’s physical boundary; usability also suffers because the on-screen 
content is occluded by the finger during interaction [15]. Paddle-based, controller, and 
joystick input are among the other available interaction methods [16], [17]. However, the user usually needs time to 
adapt to these devices in order to perform accurate selection. As stated by Ismail and Sunar [18], the 
artificial nature of such interaction creates a bottleneck that depends on user engagement. In the hope 
of removing this bottleneck, researchers began to study human modalities such as gaze, speech, and gesture recognition to 
develop ways of communicating with machines naturally and intuitively [19]. Meanwhile, Kim and Lee [2] state that 
gestures enable natural, efficient, and intuitive interaction. Therefore, in the next section we 
discuss the approaches that previous researchers have taken to address the issues highlighted above. 


2. RELATED WORKS 

As AR is explored more deeply, researchers continue to develop new and improved selection techniques 
that provide more precise and engaging user interaction. Table 1 compares 
recent approaches taken by researchers in addressing the issue of selection on the target object. 
As stated by Poupyrev et al. [20], selection techniques fall into two categories: exocentric and 
egocentric. In the exocentric approach, the user interacts from outside the virtual environment. 
The exocentric approach can be further divided into world-in-miniature (WIM) and 
automatic scaling. In the egocentric approach, the user operates directly from within the virtual 
environment, as if part of it. However, since this approach is often employed for precise 
manipulation of an object, it is less suitable for large-scale manipulation tasks. Virtual hand and 
virtual pointer are the two types of egocentric approach. The virtual hand provides an isomorphic relationship 
between the actual and virtual hands [21]. Several works have used the virtual hand approach to interact 
with virtual elements [2], [22], [23]. The virtual pointer allows the selection of target items that are outside the 
user’s reach with less actual hand movement. Although virtual pointer techniques fall into two groups [24], we 
only discuss related works in which the selection ray starts at the user’s hand (e.g., raycasting and Go-Go [25]). 

Raycasting permits selection at a distance, although the object appears smaller as its 
distance from the user increases. The targeted virtual object is selected when a collision between 
the ray and the bounding box of a 3D object in the environment is detected. Raycasting implementations 
differ in many ways, including how the ray is controlled. A point of origin and a direction are 
needed to define the ray [26]. 
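
The collision test between a ray and a bounding box can be illustrated with a short sketch. The following Python example assumes the bounding box is axis-aligned and uses the standard slab method; the function name `ray_hits_aabb` and the coordinates are illustrative and are not taken from any of the cited implementations.

```python
import math


def ray_hits_aabb(origin, direction, box_min, box_max):
    """Return the entry distance along the ray, or None if the ray misses.

    origin, direction, box_min and box_max are 3-element sequences.
    The box is assumed to be axis-aligned (an assumption of this sketch).
    """
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of faces: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        # Shrink the interval in which the ray is inside all slabs so far.
        t_near, t_far = max(t_near, t1), min(t_far, t2)
        if t_near > t_far:
            return None
    return t_near


# A ray along +z reaches a box whose near face is 2 units away.
print(ray_hits_aabb((0, 0, 0), (0, 0, 1), (-0.5, -0.5, 2), (0.5, 0.5, 3)))  # 2.0
```

Returning the entry distance rather than a boolean makes it possible to pick the nearest of several intersected boxes, which is one simple way to disambiguate among multiple candidates.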


Table 1. Comparison of recent approaches from related research on target selection 


Year  Researchers            Interaction metaphor proposed for target selection     Domain       Occluded  Distant
2022  This study             Gesture-based pointing with raycasting metaphor        Handheld AR  ✓         ✓
2022  Kapinus et al. [12]    Touch-based pointing with raycasting metaphor          Handheld AR  ✓         -
2021  Li et al. [9]          Controller-based pointing with mirror metaphor         VR           ✓         ✓
2021  Messaci et al. [10]    Gesture-based with zoom metaphor                       VR           ✓         ✓
2020  Qian et al. [27]       Gesture-based metaphor                                 Handheld AR  -         ✓
2020  Yu et al. [28]         Controller-based pointing metaphors                    VR           ✓         -
2020  Sidenmark et al. [29]  Controller-based pointing with gaze-assisted metaphor  VR           ✓         -
2019  Yin et al. [30]        Touch-based pointing metaphors                         Handheld AR  ✓         -
2018  Whitlock et al. [8]    Gesture-based metaphor                                 AR           ✓         -
2017  Jung and Woo [13]      Raycasting with target object duplication metaphor     AR           -         ✓
2017  Bellarbi et al. [14]   Gesture-based pointing with zooming metaphor           AR           -         ✓
2016  Yu and Kim [31]        Finger-pointing metaphor                               AR           -         ✓


Poupyrev et al. [25] addressed target selection on distant objects by 
presenting a technique named Go-Go, which allows seamless direct manipulation of both near 
and distant objects. Go-Go extends the user’s reach to faraway objects 
by extending a ray with the virtual hand towards the object; however, it was implemented within a VR 
environment. Several enhancements have been made to it by other researchers over the years. Jung and 
Woo [13] enhanced the Go-Go interaction system for an AR environment by enabling hand-gesture trigger motions 
to switch between regular and boost modes. As illustrated in Figure 1(a), 
the ray is projected into the distance and the virtual hand avatar works just like a real hand, allowing 
the selection of a distant virtual item. 


(a) (b) 


Figure 1. Projection of ray into the distance: (a) with virtual hand [13] and (b) with pointing raycast [32] 


Meanwhile, Yin et al. [30] included Go-Go as the baseline technique against which they compared their 
proposed novel target selection techniques (refer to Figure 2). As the interaction takes place in a handheld AR 
interface, the user points a virtual hand toward the desired target with a single touch on the screen, 
swipes up and down to alter the arm reach (see Figure 2(a)), and presses the confirmation button on 
the screen to choose the target that intersects the virtual hand, as shown in Figure 2(b). However, 
because the point cursor limits the intended target size, using the virtual hand to point at and touch an 
obstructed or tiny target is challenging. 


(a) (b) 


Figure 2. The selection process for Go-Go consists of two subtasks: (a) indicating and (b) the optional step of 
confirming the selection [30] 


Meanwhile, Olwal and Feiner [33] presented a flexible pointer to address the occlusion issue 
in target selection. The user points a ray cursor that can be bent toward the 
desired target without passing through distractor targets. This method, however, requires 
two six-degree-of-freedom devices to control the cursor, and the user must specify the 3D position of the 
intended target. Hincapié-Ramos et al. [34] implemented raycasting (originating from the 
user’s chin) based on the rotation values obtained from the handheld controller’s inertial 
measurement unit (IMU) in an AR HMD setup named GyroWand. In certain implementations, the 
selection is activated as soon as the ray intersects the targeted object (e.g., Yusof et al. [32]), as shown in Figure 1(b). 
In other cases, the selection is activated by a secondary “commit” action, such as hovering over the target 
or clicking a button on a different controller [30]. Additionally, raycasting often needs a mechanism for 
disambiguation across multiple possible targets, especially in densely populated virtual environments. Another 
example is the first-generation Microsoft HoloLens device, which employs a variant of the raycasting technique 
in which a ray is cast from the centre of the device’s viewport and is steered by moving the head. 

In the default implementation, a virtual ray is cast from the point of origin, 
normally a tracked hand or handheld joystick, along a specified pointing direction. 
The most common approach uses a button to validate the selection. Another 
important distinction is the ray’s form, which may either be a straight line in 3D space or have an aperture 
angle that effectively forms a cone [21]. Table 1 shows that 
target selection issues are often addressed in VR environments. Moreover, only a few studies 
address target selection on virtual objects that are both occluded and distant, 
and none of them addresses both issues in a handheld AR environment. 
Therefore, this research aims to propose a target selection method for distant and occluded objects 
in the handheld AR environment as a contribution to the research community. 


3. METHOD 

The proposed target selection method is designed to address selection issues on distant and 
occluded target objects in handheld AR. It combines the raycasting technique with 
hand gestures. The phases carried out to achieve the aim of this study are 
discussed in the following subsections. 


3.1. Handheld AR workspace setup 

In this study, the markerless SLAM tracking technique is implemented to enable the AR 
environment in the handheld setup. The handheld AR workspace setup for this study is shown in Figure 3, 
where the user’s left hand holds the handheld device while the other hand is free for interaction. 
The handheld device must have a camera to view the environment and a gyroscope 
for tracking. We use ARCore SLAM, in which visual information (feature 
points captured by the camera) is combined with inertial data from the device's IMU to 
determine the camera's location and orientation relative to the world over time. ARCore searches for clusters of 
feature points that appear to lie on common surfaces, such as tabletops and desks, and models both 
horizontal and vertical surfaces as 3D planes. It also determines the boundaries of each plane by 
detecting edges in the camera images. By aligning the pose of the virtual camera that renders the 3D content 
with the pose of the device’s camera, the virtual content can be rendered from the correct 
perspective. The more precisely the two cameras are aligned, the more credible and realistic the 
placement of the virtual object in the environment. The rendered virtual images are superimposed on top 
of the images captured by the device's camera, making the virtual content appear as part of the actual 
environment [35]. 


Figure 3. Handheld AR workspace setup 


3.2. Enabling hand gesture tracking 

The leap motion device is used because it provides robust hand tracking [36]. 
Rather than being placed on a table, in this study the leap motion sensor is attached to the back of the handheld 
device, enabling intuitive and natural AR experiences. Leap motion is able to track both hands simultaneously. 
However, in this study, only the right hand is tracked for the interaction. The gesture used 
is a dynamic gesture, as it involves movement while performing the target selection. Because the leap 
motion reports 3D hand data in a coordinate system different from that of the ARCamera, the data 
undergo a calibration mapping process, which we adapt from Kim and Lee [2]. 
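
The calibration procedure itself is the one adapted from Kim and Lee [2] and is not reproduced here. As a rough illustration of what a mapping between the two coordinate systems involves, the Python sketch below assumes a fixed rigid transform: a hypothetical rotation matrix and translation describing how the sensor is mounted, plus a unit conversion that assumes the sensor data arrive in millimetres while the AR scene uses metres. All names and values are illustrative.

```python
def leap_to_camera(point_mm, rotation, translation_m, scale=0.001):
    """Map a sensor-space point (millimetres) into camera space (metres).

    rotation is a 3x3 matrix given as three rows; translation_m is the
    sensor position relative to the camera. Both are hypothetical mounting
    parameters that would come from a calibration step.
    """
    # Convert units first, then rotate into the camera's axes, then shift.
    scaled = [c * scale for c in point_mm]
    rotated = [sum(r * c for r, c in zip(row, scaled)) for row in rotation]
    return [r + t for r, t in zip(rotated, translation_m)]


# Example with an identity rotation and a sensor mounted 5 cm along the z axis.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
palm_in_camera = leap_to_camera([100.0, 200.0, 300.0], IDENTITY, [0.0, 0.0, 0.05])
```

In practice the rotation would account for the sensor facing away from the user at the back of the device, so an identity matrix is used here only to keep the example easy to follow.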

Figure 4 shows the depth threshold and motion-based process by which the hand is recognized. This standard 
tracking process produces the position and orientation that form a coordinate system, so that the hand inputs can be 
used to construct the interaction methods. As the user moves their hand, leap motion continuously tracks 
the movement. The leap motion device processes the raw data to determine the hand’s skeletal calibration, 
which is then used for skeletal hand tracking. The tracking data obtained from the sensor 
can then be used to develop features that allow the recognition of hand gestures. 


[Figure 4 diagram: Recognition (two hands; depth threshold; motion-based) -> Tracking (position; orientation) -> Modelling (skeleton-based; rigidbody) -> Hand gesture (static and dynamic gesture)]


Figure 4. Leap motion—the depth threshold and motion-based process 


3.3. Transferring gesture data in handheld 

The leap motion device cannot be connected directly to the handheld device, 
as it is not designed for handheld use. Therefore, the gesture data that leap motion captures 
need to be sent over the network to the AR scene. The handheld device needs a stable network 
connection to keep the gesture movement smooth and robust; if the network is weak, data transmission 
is delayed and the hand movement is no longer shown in real time. To accomplish this, multiplayer 
networking is enabled in the system, adopted from the study by Nor’a et al. [37]. In this study, photon unity 
networking (PUN) is used to transmit data between devices: the attached leap motion 
device is connected to a laptop computer, which recognizes the hand gesture and sends the fingers’ position and 
orientation to the server. The system running on the handheld device receives the data from the server and 
handles the interactions. Figure 5 shows the flow of the data transfer process. 

The 3D hand orientation, gesture, and direction are obtained from the leap motion, which is connected 
to the computer. The system running on the computer acts as the sender that transmits the fingertip positions 
to the PUN dedicated server. The system running on the handheld device, which acts as the client, 
then receives the hand data from the server through the network. This real-time synchronization via PUN 
allows the hand data from the leap motion device to be viewed smoothly on the client side over an internet 
connection. Furthermore, it enables the user to interact with the target object viewed through the 
handheld display using a gesture of the right hand. 
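
The sender and client roles can be sketched independently of any particular networking library. The Python example below does not use the PUN API; it only illustrates one possible message format for the fingertip positions. The field names, the JSON encoding, and the rule of discarding out-of-order frames are all assumptions made for this sketch.

```python
import json


def encode_hand(frame_id, fingertips):
    """Sender side: serialise right-hand fingertip positions into bytes.

    fingertips maps a finger name to an (x, y, z) position.
    """
    payload = {
        "frame": frame_id,
        "hand": "right",
        "tips": {name: [round(c, 4) for c in pos] for name, pos in fingertips.items()},
    }
    return json.dumps(payload).encode("utf-8")


def decode_hand(message, last_frame):
    """Client side: return (frame, tips), or None for a stale or repeated frame."""
    payload = json.loads(message.decode("utf-8"))
    if payload["frame"] <= last_frame:
        # A delayed packet would move the hand backwards in time, so skip it.
        return None
    return payload["frame"], {name: tuple(pos) for name, pos in payload["tips"].items()}


message = encode_hand(7, {"thumb": (0.01, 0.02, 0.30), "index": (0.03, 0.02, 0.31)})
frame, tips = decode_hand(message, last_frame=6)
```

Carrying a frame counter in each message is one simple way for the client to ignore late packets on a weak network, at the cost of occasionally skipping an update.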


[Figure 5 diagram: Leap motion device attached to the handheld and connected to the laptop (computer) -> Sender: the position of the right hand is gathered from the leap motion input and the fingertip positions are sent through the network -> Server -> Client: the system on the handheld receives the hand data; the handheld display shows the AR environment and the user performs gesture-based interaction with the position of the right hand]


Figure 5. Gesture data transferred by sender and received at client 


3.4. Proposed target selection method 

Based on our study of the related research, we propose combining the raycasting technique with 
real hand gestures for the target selection method. The virtual object is selected when a 
collision between the ray and the bounding box of a 3D object in the environment is detected and the target 
distance range and device tilt angle conditions are met. The target selection method consists of two subtasks: 
(i) indicating the target object and (ii) confirming the target selection. 

The first subtask is initiated when the projected ray 
collides with the collider of the targeted object. In this study, the ray 
is not cast directly from a position on the user’s hand. Instead, the handheld device position is used 
as a reference position, an offset is added to approximate the shoulder position, and a ray is projected from the 
shoulder through the palm position and out towards the targeted object, as shown in Figure 6. 


[Figure 6 diagram: in the leap hand space, the ray is projected with the shoulder position as anchor; it is cast through the shoulder (anchor) position and the palm, and triggers selection when it hits the target object on the tracked surface]


Figure 6. (Left side) illustration of the ray projected from offset shoulder as anchor, (right side) pinch gesture 
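
The shoulder-anchored ray can be sketched in a few lines. In the Python example below, the offset from the device to the shoulder is a hypothetical value chosen only for illustration; the paper does not state the offset it uses.

```python
import math


def shoulder_ray(device_pos, shoulder_offset, palm_pos):
    """Return (origin, unit_direction) of a ray from the estimated shoulder
    position through the palm.

    The shoulder is approximated as device_pos + shoulder_offset.
    """
    origin = tuple(d + o for d, o in zip(device_pos, shoulder_offset))
    delta = tuple(p - s for p, s in zip(palm_pos, origin))
    length = math.sqrt(sum(c * c for c in delta))
    if length == 0.0:
        raise ValueError("palm coincides with the shoulder anchor")
    return origin, tuple(c / length for c in delta)


# Hypothetical offset: shoulder 0.5 m behind the device along the z axis.
origin, direction = shoulder_ray((0.0, 0.0, 0.0), (0.0, 0.0, -0.5), (0.0, 0.0, 0.5))
```

Anchoring the ray behind the hand means that small palm movements change the ray direction less than they would if the ray started at the palm, which is a common motivation for this kind of body-anchored pointing.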


The last subtask, confirming the target selection, is done with a pinch gesture (refer to Figure 6 (right 
side)). The pinch gesture is one of the most natural ways to interact with digital interfaces. It is equivalent to grabbing 
or picking and provides a natural cue for selecting or moving an item in an interactive system, which can support accurate 
and efficient selection. As the leap motion device tracks the hand movement, the distance between the tips of the 
index finger and thumb is calculated. If this distance is less than the pinch threshold value, 
the system detects the pinching state of the hand gesture [38]. The pinching state has no effect 
if no target object is indicated (the ray is not intersecting any targeted object). 
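
A minimal sketch of this pinch check is given below in Python. The function names and the example coordinates are illustrative, and the threshold is passed in as a parameter because its unit depends on the tracking data used.

```python
import math


def is_pinching(thumb_tip, index_tip, threshold):
    """True when the thumb and index fingertips are closer than the threshold."""
    return math.dist(thumb_tip, index_tip) < threshold


def confirm_selection(indicated_target, thumb_tip, index_tip, threshold):
    """Return the confirmed target, or None.

    A pinch has no effect unless the ray is already indicating a target.
    """
    if indicated_target is None:
        return None
    return indicated_target if is_pinching(thumb_tip, index_tip, threshold) else None


selected = confirm_selection("red_cube", (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), threshold=0.4)
```

Checking for an indicated target before evaluating the pinch keeps an accidental pinch in empty space from triggering any action.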


4. RESULTS 

Although interaction with a virtual object includes selection and manipulation, in this study we 
focus on target selection for distant and occluded objects in handheld AR. Figure 7 describes the experiment 
for the target selection method. As shown in Figure 7(a), the target selection is performed at three 
distances: 0.6 meters, 1.45 meters, and 2.3 meters. The distance is measured between the centre of the handheld 
device and the occluded target object. The study by Qian et al. [27] was used as a reference, as it examined interaction 
on smartphones at two interaction depths (close-range and distant), a setup very 
similar to this study. In addition, the target object is occluded at levels of 20%, 50%, and 80%, 
following the study by Yin et al. [30]. The occluded area of the targeted object is calculated with the 
formula for the surface area of a cube. We further limit the tilt angle of the device to between 45° and 
90° for the target selection. We record the task completion time of the target selection task to measure the 
performance of the target selection method. Figure 7(b) shows the flowchart of the target selection process. 

Figure 7. Target selection experiment (a) workspace setup and (b) flowchart of the target selection method 

The right hand is used to perform the target selection. The process starts with scanning the 
environment to find the real hand. Once the real hand is detected, a ray is drawn and positioned at the user’s palm, 
and the system checks whether the ray has collided with the target object. Figure 8 shows the output of the target selection 
method. The red cube is the occluded target object for selection, while the blue and 
yellow cubes are the distractors that occlude it in the scene. If the ray has collided with the target object, the target 
object is indicated for selection and gives visual feedback (a black outline of the object), as shown in 
Figure 8(a). Next, to confirm the selection, the distance between the tips of the thumb and index finger is 
calculated and compared against the pinch threshold, which is set to 0.4f. Once this distance is below the 
threshold and the pinch gesture is detected, visual feedback is given to the user (the outline of the targeted 
object turns green), as shown in Figure 8(b). 


Pinch gesture 


(a) (b) 
Figure 8. Subtasks of the target selection method: (a) indicating target object and (b) confirming selection 
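
The two subtasks and their visual feedback can be summarised as a small decision function. The Python sketch below is only an illustration of the flow described in this section: the default threshold mirrors the 0.4 value stated in the text, while the function name and the way the outline colour is returned are assumptions.

```python
def outline_colour(ray_hits_target, pinch_distance, pinch_threshold=0.4):
    """Return the outline colour to draw on the target, or None for no outline.

    "black" marks an indicated target; "green" marks a confirmed selection.
    """
    if not ray_hits_target:
        # Without an indicated target, the pinch state changes nothing.
        return None
    if pinch_distance < pinch_threshold:
        return "green"
    return "black"


# The ray is on the target, but the fingers are still apart: indicated only.
state = outline_colour(True, 0.6)
```

Keeping the decision in one function makes the order of the checks explicit: the ray test always comes first, and the pinch is evaluated only for an indicated target.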


5. CONCLUSION 

This paper presented the proposed target selection method, which combines hand gestures with the 
raycasting technique in an AR environment enabled by markerless tracking in a handheld 
setup. This study has addressed the issue of target selection on distant and occluded objects in handheld AR. 
Target selection differs from basic selection: basic selection can be made on any object, whereas 
target selection is made on a targeted object under certain conditions. The proposed target selection 
method can be used in various fields that require interaction, such as games, simulations, training, and 
education. The main contribution is a target selection method that works with 
markerless tracking on distant and occluded objects in handheld AR. 
AR tracking is achieved using the SLAM tracking technique, for which the user is required to scan their 
surroundings for plane surfaces. Hand gesture tracking is then enabled in the 
system, with the hand data transmitted over the network to drive the target selection process. 
The target selection process is designed with two subtasks. First, the user indicates the targeted object 
by pointing the ray towards it. To complete the target selection, the user 
performs a pinch gesture. To test its functionality, the time taken to perform the target selection is recorded. 

For future work, we will conduct a complete experiment to 
evaluate the performance of the proposed target selection method on occluded and 
distant objects in handheld AR. A system usability scale (SUS) questionnaire will also be administered after the 
user has completed the experiment. There are also a few limitations that could be addressed 
by other researchers. Firstly, the proposed target selection method, which implements 
gesture-based interaction, does not take occlusion of the real hand into 
consideration. Thus, while performing selection, the virtual object augmented in the handheld AR environment 
always appears in front of the real hand, even when the virtual object is actually positioned behind the 
hand. When interacting through real hand gesture recognition, this occlusion issue is 
an inevitable problem that needs improvement. Secondly, extending the target selection method to 
a collaborative setup between two or more users could help to support complex 
collaboration for professional use when interacting with virtual objects in AR. Finally, because this study 
uses only the pinch gesture for target selection, future researchers could 
explore different types of gestures or add speech input. 